Increasing Public Data Transparency for Immigration Law in Canada

Author

Ismail (Husain) Bhinderwala, Jessica Yu, Ke Gao, Yichun Liu

Published

May 4, 2025

1 Executive Summary

This project addresses the lack of transparency and accessibility in Canadian immigration inadmissibility decisions by asking: How can data science methods uncover patterns or biases in IRCC rulings and support legal advocacy? Inadmissibility decisions have lasting effects on individuals and families, yet the underlying data is poorly documented and difficult to analyze. Partnering with Heron Law Offices, which represents clients affected by these decisions, we propose a data-driven solution. We developed a comprehensive data pipeline that includes a data quality report to assess IRCC data reliability, an LLM-based extraction system to structure information from court rulings, and curated data stories to highlight recurring themes. Our analysis applies statistical inference and natural language processing to uncover patterns and possible biases. These insights are made accessible through an interactive dashboard, enabling legal professionals to explore the data and support clients more effectively.

2 Introduction

2.1 Problem Overview

Inadmissibility decisions issued by Immigration, Refugees and Citizenship Canada (IRCC) can have severe, long-lasting consequences for individuals and their families, including denial of entry or removal from Canada. Yet the data underlying these decisions is often fragmented, inconsistently structured, and inaccessible to the legal professionals who most need it. This lack of transparency makes it difficult to assess systemic fairness, identify regional or racial disparities, or support legal and policy advocacy.

2.2 Why This Matters

Our project partner, Heron Law Offices, frequently represents individuals facing inadmissibility and litigates immigration decisions in federal court. The firm has identified several concerns, including:

  • The potential role of geopolitical influence (e.g., cases from Ukraine, Syria, China, Iran)
  • Patterns suggesting anti-Black and anti-African bias, reflected in disproportionately high dismissal rates for applicants from African and Caribbean countries
  • The surge in Mandamus applications by Chinese nationals in recent years
  • Gaps and inconsistencies in the released data that limit legal interpretability and policy reform.

These challenges are compounded by IRCC’s growing reliance on Advanced Analytics for decision-making, which further raises the stakes for transparency and oversight.

2.3 Project Motivation

Our project aims to improve data accessibility, clarity, and interpretability by applying data science methods to immigration-related datasets. Specifically, we translate broad legal concerns into tractable data problems, with the goal of surfacing meaningful patterns and equipping legal advocates with structured, evidence-based insights.

2.4 Data Sources

We worked with the following datasets:

  • A34(1) Refusals Dataset (2019–2024): Refusals by country, inadmissibility grounds, and status.
  • Litigation Applications Dataset (2018–2023): Case types, leave decisions, and decision offices in federal court filings.
  • Refugee Law Lab Legal Text Dataset (2001–2024): Full-text federal court decisions used for NLP analysis.

2.5 Our Solution

To address the legal partner’s needs and highlight the most relevant data narratives, our proposed solution includes four integrated components:

  • Data Stories
    A curated narrative across three themes:
    • Grounds for refusal by year and country.
    • Regional differences in litigation outcomes.
    • Country-level litigation patterns for Nigeria, India, Iran, and China.
  • Interactive Dashboard
    • A34(1) Refusals: Filter by year, country, and grounds.
    • Litigation Applications: Navigate trends by case type, leave decision, and region.
  • LLM-Based Pipeline
    A legal text processing pipeline using LLaMA-3 and regular expressions to extract:
    • Judge names, decision outcomes, cities.
    • Case classification (e.g., inadmissibility vs. others).
  • Data Quality Report
    A structured evaluation identifying:
    • Missing disaggregations (e.g., country-level variables in refusals).
    • Inconsistencies in naming, case categorization, and court metadata.
    • Recommendations for improving future transparency and open data releases.

2.6 Technical Contribution

Our team applied data science expertise to:

  • Conduct exploratory and inferential analysis (e.g., chi-square tests on litigation outcomes by country).
  • Develop reproducible pipelines for data cleaning and legal text processing.
  • Build an accessible, advocacy-oriented tool that aligns with Heron Law’s public interest goals and pushes for more equitable and transparent immigration systems.

This work supports legal professionals, researchers, and policymakers in better understanding Canadian immigration decisions and in advocating for the fairness and openness that the system demands.

3 Data Science Methods

This section outlines our complete data science workflow, from initial exploration to final deployment. Our methodological decisions were motivated not only by technical feasibility, but also by client needs, data limitations, and legal-ethical considerations. We approached this work iteratively, continuously refining methods and outputs in response to stakeholder feedback.

3.1 Environment & Reproducibility

We used Conda to create a reproducible environment for all stakeholders. Conda was selected over Docker to reduce complexity while still ensuring consistent package versions across platforms. Environment files were shared with the partner to support replication and sustainability of the project.

3.2 Data Cleaning & Preprocessing for IRCC Datasets

  • All missing values were systematically identified and flagged for review.
  • Country names were standardized (e.g., “Democratic Rep. of Congo” vs. “Congo, Democratic Republic of the”) to ensure consistency during aggregation.
  • Inadmissibility refusals were sparse and unevenly distributed, with most countries reporting fewer than 50 cases across all years. This led us to favor exploratory analysis over statistical inference for this dataset.
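
The country-name standardization step can be sketched as a small lookup over normalized keys. This is a minimal illustration, not the project's actual cleaning code: the alias table, the choice of canonical label, and the function names are assumptions; only the two Congo spellings come from the example above.

```python
import re
import unicodedata

# Hypothetical alias table: normalized variant -> canonical label.
# Only the Congo pair appears in the report; the canonical choice is assumed.
ALIASES = {
    "democratic rep of congo": "Congo, Democratic Republic of the",
    "congo democratic republic of the": "Congo, Democratic Republic of the",
}


def normalize_key(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def standardize_country(name: str) -> str:
    """Return the canonical label, or the trimmed input if no alias is known."""
    return ALIASES.get(normalize_key(name), name.strip())
```

Names without a known alias pass through unchanged, so unmapped variants remain visible for manual review rather than being silently merged.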

3.3 Unstructured Text Processing & LLM Pipeline

To process Canadian Federal Court decisions (2014–2024) curated by the Refugee Law Lab, we implemented a multi-stage pipeline that combines rule-based methods (Regex) with large language models (LLMs) for zero-shot classification and attribute extraction. This hybrid approach was necessary to handle the linguistic complexity and semi-structured nature of legal decisions.

Figure 1: LLM-based multi-stage pipeline for legal decision processing.

As shown in Figure 1, the pipeline consists of multiple stages, integrating both deterministic and AI-driven components.

3.3.1 Preprocessing & Filtering

  • Temporal scope: Retained decisions from 2014 to 2024.
  • Language normalization: Removed duplicate French texts when English equivalents were present, identified via normalized citation strings.
  • Topical filtering: Selected cases mentioning immigration-related authorities (e.g., “Citizenship and Immigration,” “MCI”) and excluded refugee protection claims using regex patterns.
  • Inadmissibility focus: Isolated cases with terms such as “inadmissible” or “inadmissibility.”
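
The four filtering steps above can be sketched as follows. The record fields (`year`, `citation`, `language`, `text`), the refugee-claim pattern, and the assumption that English and French versions share a neutral citation differing only in the court code (`FC` vs. `CF`) are illustrative choices, not the pipeline's actual rules.

```python
import re

# Illustrative patterns; the project's real regexes are not reproduced here.
AUTHORITY_RE = re.compile(r"Citizenship and Immigration|\bMCI\b", re.IGNORECASE)
REFUGEE_RE = re.compile(
    r"refugee protection claim|claim for refugee protection", re.IGNORECASE
)
INADMISSIBLE_RE = re.compile(r"\binadmissib(?:le|ility)\b", re.IGNORECASE)


def citation_key(citation: str) -> str:
    """Map '2020 FC 100' and '2020 CF 100' to the same key (assumed format)."""
    match = re.search(r"(\d{4})\s*(?:FC|CF)\s*(\d+)", citation.upper())
    if match:
        return f"{match.group(1)}-{match.group(2)}"
    return citation.strip().upper()


def filter_decisions(records):
    """Apply temporal, language, topical, and inadmissibility filters in order."""
    in_scope = [r for r in records if 2014 <= r["year"] <= 2024]
    english_keys = {
        citation_key(r["citation"]) for r in in_scope if r["language"] == "en"
    }
    kept = []
    for r in in_scope:
        # Drop a French text only when an English equivalent exists.
        if r["language"] == "fr" and citation_key(r["citation"]) in english_keys:
            continue
        text = r["text"]
        if not AUTHORITY_RE.search(text) or REFUGEE_RE.search(text):
            continue
        if INADMISSIBLE_RE.search(text):
            kept.append(r)
    return kept
```

French decisions with no English counterpart are retained, which matches the rule of removing French texts only when they duplicate an English one.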

3.3.2 Inadmissibility Classification

  • Used regular expressions to detect direct citations of IRPA sections (e.g., “s.34”, “section 36”).
  • Applied semantic fallback patterns (e.g., “espionage”, “indictable offence”) to classify when statutory references were missing.
  • Defaulted to an “other” category for ambiguous or unclassifiable cases.
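
A minimal sketch of this citation-first, keyword-fallback logic is shown below. The label names and the restriction to sections 34 and 36 are assumptions made for brevity; the fallback keywords are the two examples given above.

```python
import re

# Hypothetical labels for the two sections used as examples in this report.
SECTION_LABELS = {"34": "s.34 security", "36": "s.36 criminality"}

# Direct statutory references such as "s.34", "s. 34(1)(f)", or "section 36".
SECTION_RE = re.compile(r"\b(?:s\.|section)\s*(34|36)\b", re.IGNORECASE)

# Semantic fallbacks, checked only when no statutory reference is found.
FALLBACK_PATTERNS = {
    "34": re.compile(r"\bespionage\b", re.IGNORECASE),
    "36": re.compile(r"\bindictable offence\b", re.IGNORECASE),
}


def classify_ground(text: str) -> str:
    """Classify by explicit citation first, then keywords, else 'other'."""
    match = SECTION_RE.search(text)
    if match:
        return SECTION_LABELS[match.group(1)]
    for section, pattern in FALLBACK_PATTERNS.items():
        if pattern.search(text):
            return SECTION_LABELS[section]
    return "other"
```

Checking explicit citations before keywords keeps the deterministic signal authoritative, so a fallback term never overrides a stated section.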

3.3.3 LLM-Based Extraction & Validation

We deployed LLaMA 3 via Ollama, executed locally, to extract decision-level metadata using targeted prompts:

  • Judge name: Extracted from the top 30 lines.
  • City of hearing: Parsed from lines 10–25.
  • Decision outcome: Retrieved from the final 20–50 lines.
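
The line-windowing strategy above can be sketched as below. The prompt wording, the model tag `llama3`, the seed value, and the function names are assumptions; the request targets Ollama's local `/api/generate` endpoint with a fixed seed and zero temperature, one way of fixing sampling parameters. The `ask` argument is injectable so the windowing logic can be checked without a running model.

```python
import json
import urllib.request


def build_prompts(decision_text: str) -> dict:
    """Slice the decision into the three windows described above."""
    lines = decision_text.splitlines()
    windows = {
        "judge": lines[:30],     # top 30 lines
        "city": lines[9:25],     # lines 10-25, counted from 1
        "outcome": lines[-50:],  # final 50 lines (upper end of the 20-50 range)
    }
    questions = {
        "judge": "Name the presiding judge.",
        "city": "Name the city where the hearing took place.",
        "outcome": "State the outcome of the application.",
    }
    return {
        field: f"{questions[field]} Answer briefly.\n\n" + "\n".join(window)
        for field, window in windows.items()
    }


def ask_ollama(prompt: str, model: str = "llama3",
               url: str = "http://localhost:11434/api/generate") -> str:
    """Send one prompt to a locally running Ollama server (assumed default port)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": 42},  # assumed values
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


def extract_metadata(decision_text: str, ask=ask_ollama) -> dict:
    """Run one targeted prompt per field and collect the raw answers."""
    return {field: ask(prompt) for field, prompt in build_prompts(decision_text).items()}
```

Restricting each prompt to a short window keeps the context small and focuses the model on the part of the decision where the attribute usually appears.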

We chose LLMs over traditional NLP pipelines (e.g., spaCy + rule-based parsers) because the latter struggled with variability in legal writing and required extensive manual rule-crafting. LLMs offered greater generalizability, especially for under-structured decisions. However, LLMs introduce risks of non-determinism and hallucination, which we mitigated through prompt tuning and manual validation.

Future work could explore fine-tuning a domain-specific LLM or using ensemble prompts for better robustness. However, these approaches require significant compute resources and labelled data, which were outside the scope of this project.

3.3.4 Manual Review & Reproducibility

  • Manual validation by our partner on a small sample of cases indicated ~85% accuracy for classification and attribute extraction.
  • Recognized challenges included:
    • Non-determinism in LLM outputs unless sampling parameters were fixed.
    • Prompt sensitivity, where small changes in phrasing affected results.
    • Hardware variability, impacting LLM behavior and output timing.
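
Because the validation sample was small, an accuracy estimate is best read together with an interval. The sketch below computes a Wilson score interval for a proportion; the sample size in the usage note is hypothetical, since the number of validated cases is not stated in this report.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For a hypothetical sample of 20 cases with 17 correct (85%), `wilson_interval(17, 20)` gives roughly 0.64 to 0.95, which shows how wide the uncertainty is at that scale.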

Despite these limitations, the LLM pipeline significantly outperformed rule-based methods alone and was essential for scaling the extraction of structured insights from noisy legal text.

3.4 Exploratory Data Analysis & Story Development

For the first four weeks, we engaged in exploratory data analysis (EDA) to identify emergent patterns and formulate hypotheses. Key early findings included:

  • A sharp rise in Ukrainian inadmissibility refusals in 2024 (from 5 to 134 cases)
  • A similar increase for Syria in 2022 (from 9 to 46 cases)
  • A marked increase in mandamus litigation by Chinese applicants in 2023

These findings were discussed and validated with the partner, who confirmed their alignment with court-level observations. Based on this discussion, we prioritized stories on:

  • Geopolitical shifts (e.g., Ukraine, Syria, China)
  • Anti-Black/African bias (e.g., high dismissal rates for African and Caribbean countries)
  • Missing disaggregated data, especially by office or demographic attributes

To improve narrative delivery, we iteratively redesigned visualizations based on best practices from sources such as Storytelling with Data and SAGE Research Methods. Simple charts evolved into layered, interpretable data stories with guided narratives.

3.5 Statistical Analysis

We used chi-square tests of independence because they suit comparisons between categorical variables and do not assume a particular distribution for the underlying data, provided expected cell counts are adequate. While logistic regression could provide finer-grained estimates, we avoided it due to low sample sizes and missing covariates. Two null hypotheses were tested for the top four countries (Nigeria, Iran, India, China):

  1. The distribution of case types is independent of country.
  2. The distribution of decision outcomes is independent of country.

Both null hypotheses were rejected (p < 2.2e-16), suggesting that applicant nationality is associated with different litigation experiences. However, these results are:

  • Descriptive, not causal
  • Limited to countries with sufficient sample sizes
  • Unable to control for confounders due to lack of joined data or demographic fields
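
For readers who want to reproduce this kind of test, the sketch below computes the Pearson chi-square statistic, degrees of freedom, p-value, and the smallest expected cell count for a contingency table using only the standard library. It is an illustration rather than the code used in this project, it applies no continuity correction, and the function names are assumptions.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail chi-square probability.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even dof and
    Q(1/2, y) = erfc(sqrt(y)) for odd dof.
    """
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def chi_square_independence(table):
    """Return (statistic, dof, p_value, min_expected) for a table of counts.

    Rows might be countries and columns case types or outcomes. Every row
    and column total must be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, chi2_sf(statistic, dof), min_expected
```

Returning the smallest expected count makes it easy to check the usual rule of thumb (expected counts of at least 5) before trusting the p-value.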

3.6 Visualization & Dashboard Design

We developed an interactive dashboard using Streamlit and Plotly, composed of:

  • Three data story pages (A34(1) Refusals, Litigation Outcomes, Country-Level Trends)
  • Two interactive explorers (Litigation and A34(1) Refusals)

Design priorities included:

  • Accessibility for non-technical users
  • Clear, annotated trends and visual narratives
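
The filtering logic behind an explorer page can be sketched independently of the UI framework. In this illustration the row fields (`country`, `year`, `ground`, `refusals`) and function names are hypothetical; in a Streamlit app the filter arguments would typically come from widgets such as `st.multiselect` and `st.slider`, and the aggregated result would be passed to a Plotly chart.

```python
from collections import Counter


def filter_rows(rows, countries=None, years=None, grounds=None):
    """Keep rows matching every filter that is set; None means no filter.

    `years` is an inclusive (start, end) pair.
    """
    def keep(row):
        return (
            (countries is None or row["country"] in countries)
            and (years is None or years[0] <= row["year"] <= years[1])
            and (grounds is None or row["ground"] in grounds)
        )
    return [row for row in rows if keep(row)]


def yearly_totals(rows):
    """Sum refusal counts per year, sorted by year, ready for plotting."""
    totals = Counter()
    for row in rows:
        totals[row["year"]] += row["refusals"]
    return dict(sorted(totals.items()))
```

Keeping the filter and aggregation as plain functions lets them be tested without launching the dashboard.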

3.7 Evaluation Metrics

| Component | Metric / Method | Notes |
| --- | --- | --- |
| Statistical Testing | p-value from chi-square tests | Used only where assumptions were met |
| LLM Pipeline | Accuracy (manual validation) | Partner-verified; ~85% on early sample |
| Data Quality | Missing value rate, consistency checks | Shared as part of data quality report |

3.8 Methodological Limitations & Assumptions

  • Data Sparsity: Many variables (e.g., refusals by country-year-ground) had too few entries (<5) for valid statistical testing.
  • No Causal Inference: All findings are observational. Due to missing contextual variables, no claims of discrimination or intent can be robustly made.
  • Unjoined Datasets: Key datasets (e.g., litigation + refusal) could not be linked at case level, limiting multivariate analysis.
  • Office-level Analysis: More than half of cases lacked decision office information, making regional bias analysis incomplete.
  • LLM Variability: Extraction outputs varied by run; this variability was partially mitigated through seeding and manual verification.

3.9 Stakeholder & Ethical Considerations

Our work supports immigration lawyers, legal advocacy groups, and policymakers. Ethical risks included:

  • Overinterpretation of sparse data
  • Biases in LLM extraction
  • Privacy risks from court text fields

To mitigate these:

  • We documented all data limitations explicitly
  • Provided caveats and disclaimers within the dashboard
  • Avoided drawing conclusions where statistical evidence was insufficient

4 Data Product and Results

Our data product was designed to address the dual priorities of our legal advocacy partner: (1) identifying trends in inadmissibility and litigation that may reflect systemic bias or inconsistencies, and (2) enabling future research and policy work grounded in high-quality legal data. To meet these goals, we delivered an integrated data product composed of: a Data Quality Report, Curated Data Stories, Interactive Exploratory Dashboards, and an LLM-based Pipeline for extracting metadata from unstructured legal decisions.

These components work together to create a reusable foundation for empirical legal research, evidence-based advocacy, and internal capacity-building for legal professionals.

4.1 Data Quality Report

The Data Quality Report was developed to provide transparency into the structure and limitations of the two IRCC datasets, A34(1) Inadmissibility Refusals and Litigation Application Decisions. Rather than treating the datasets as complete, we identified critical gaps so the partner could better interpret patterns and advocate for improved data transparency from IRCC.

Purpose & Use Case

This report is intended to help the partner challenge the evidentiary value of government data when it is incomplete or inconsistently recorded. For instance, undocumented decision offices or missing filing dates directly affect the ability to analyze regional bias or litigation timelines.

Examples of Key Gaps

  • Most primary decision-office fields were missing, limiting regional analysis.
  • A34(1) refusal data lacked case counts for permanent residents in several years.
  • The litigation dataset omitted filing dates, restricting temporal and procedural analysis.
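
Gaps like these can be quantified with a per-column missing-value rate. The sketch below is illustrative only: the tokens treated as missing and the column names in the usage note are assumptions, not the actual IRCC schema.

```python
import csv
import io

# Assumed placeholders that count as missing, compared case-insensitively.
MISSING_TOKENS = {"", "na", "n/a", "null", "unknown", "--"}


def missing_rates(csv_text: str) -> dict:
    """Return the share of missing values per column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = (row.get(name) or "").strip().lower()
            if value in MISSING_TOKENS:
                counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in counts}
```

Calling `missing_rates` on an export with hypothetical columns such as `decision_office` or `filing_date` yields one rate per column, which can be reported alongside each chart that depends on that field.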

Limitations and Reflections

While we cannot fix these omissions, documenting them is itself valuable: it allows legal stakeholders to contextualize quantitative findings and identify where further qualitative or legal review is needed. However, IRCC may not acknowledge or rectify these issues, so our product avoids overpromising and instead offers cautious interpretation of patterns based on known limitations.

4.2 Curated Data Stories

We designed three static, narrative-style data stories that present insights in a structured and accessible way. These stories were developed in collaboration with our partner and validated through regular review meetings to ensure they reflect legal relevance.

Why Stories?

Curated stories serve advocacy needs more effectively than open-ended plots: they are shareable, interpretable, and focused on narratives the client already cares about (e.g., potential systemic bias, country-specific litigation trends). They also help prevent overinterpretation of weak signals in the data, especially given the quality issues flagged in our report.

Story Overviews & Client Relevance

  1. A34(1) Refusals by Country and Ground
    • Insight: A34(1)(f) was the most cited inadmissibility ground, with Ukraine and Syria showing major spikes, as seen in Figure 2.
    • Why it matters: Highlights geopolitical patterns that may warrant further scrutiny.
Figure 2: (a) Heatmap of A34(1) refusals by country and year (top 5 countries); panel (b) is uncaptioned.
  2. Litigation Outcomes by Region and Country
    • Insight: Nigeria dominated RAD appeals; Iran had a spike in visa refusals post-2020, as shown in Figure 3.
    • Why it matters: Supports country-specific advocacy or policy submissions based on litigation profiles.
Figure 3: Litigation outcomes by case type and country for selected regions (China, India, Iran, Nigeria).
  3. Litigation Trends Over Time
    • Insight: Steady increase in federal litigation since 2018, a dip in 2020 (COVID), then a strong rebound, as seen in Figure 4.
    • Why it matters: May reflect broader procedural shifts, backlogs, or access-to-justice barriers.

4.3 Interactive Exploratory Dashboards

To complement the curated stories, we built two interactive dashboards using Streamlit and Plotly. These tools allow legal professionals or researchers to filter by case type, country, refusal ground, or time range, revealing tailored views that support investigative or legal work.

Use Case & Justification

While curated stories convey known patterns, exploration empowers discovery, especially for legal practitioners preparing country reports, litigation strategies, or academic research. Interactivity is essential for users who wish to dig deeper into their own areas of interest.

Examples

  • A lawyer representing a Syrian client could use the A34(1) explorer to review trends in inadmissibility refusals.
  • A researcher could filter the litigation dashboard to compare decision outcomes for Iranian vs. Indian applicants across years.

Design Priorities

  • Clean layout with minimal cognitive load
  • Hover tooltips and annotations for interpretability
  • Filters for key attributes (country, year, ground, etc.)

Figure 5 shows a snapshot of the dashboard that reflects these design priorities.

Figure 5: A snapshot of the Interactive Dashboard

Limitations & Risks

While exploration is valuable, it can also lead to misinterpretation of noisy or sparse data. To mitigate this, we embedded tooltips and footnotes throughout the dashboard to warn users of known data quality concerns. We also advise our partner to treat these tools as starting points for inquiry, not sources of final proof.

4.4 LLM-Based Metadata Extraction Pipeline

As a final component, we built a local LLM pipeline using LLaMA 3 (via Ollama) to extract structured metadata from unstructured court decisions. This supports long-term capacity building: once new decisions are released, the partner can re-run the pipeline to keep datasets updated.

Why LLMs?

Manual extraction is not scalable, and rule-based methods alone failed to handle the complexity of legal language. LLMs enabled robust extraction of attributes such as decision outcome, judge name, and city of hearing even in the absence of strict formatting.

Pros

  • Enables timely updates to litigation datasets
  • Achieved ~85% accuracy in manual partner validation
  • Generalizable across years and decision formats

Limitations

  • Non-determinism without prompt engineering and seeding
  • Hardware-dependent reproducibility
  • Errors in edge cases (e.g., multiple judges, ambiguous outcomes)

4.5 Future Improvements and Alternatives

Alternative Products Considered

  • A Shiny app could offer finer-grained UI, but would increase deployment complexity and cost.
  • Fully supervised LLM classification for decision outcomes would improve accuracy, but requires hand-labelled data and compute resources beyond current scope.

Improvement Areas

  • Add summaries or trend annotations to dashboard outputs
  • Add hoverable definitions or explanations for complex terms (e.g., “mandamus”, “RAD decisions”) to support users unfamiliar with legal jargon.
  • Improve LLM prompt design to enhance determinism (e.g., use explicit output templates, enforce structure) and reduce prompt sensitivity.
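
One way to enforce structure, sketched here under stated assumptions, is to ask for JSON in a fixed template and reject any answer that does not parse or falls outside an allowed label set. The label set, template wording, and function name below are hypothetical and are not the prompts used in this project.

```python
import json

# Hypothetical label set for decision outcomes.
ALLOWED_OUTCOMES = {"allowed", "dismissed", "other"}

# Explicit output template; doubled braces survive str.format as literals.
PROMPT_TEMPLATE = (
    "Read the excerpt and answer with JSON only, in exactly this form:\n"
    '{{"outcome": "allowed" | "dismissed" | "other"}}\n\n'
    "Excerpt:\n{excerpt}"
)


def parse_outcome(raw: str):
    """Return a validated outcome label, or None to flag the case for review."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = str(data.get("outcome", "")).strip().lower()
    return label if label in ALLOWED_OUTCOMES else None
```

Answers that fail validation return `None` instead of a guessed label, so they can be routed to manual review rather than entering the dataset unchecked.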

5 Conclusions & Recommendations

5.1 Reframing the Problem

The central question driving this project was: How can data science tools uncover patterns in Canadian inadmissibility and litigation processes to support legal practitioners, advocates, and policy actors in promoting transparency and fairness? Our partner sought scalable methods to analyze immigration trends and decision-making, identify systemic disparities, and advocate for data-informed reforms.

5.2 How the Data Product Addresses This Need

Our solution combines a Data Quality Report, Curated Data Stories, Interactive Exploratory Dashboards, and an LLM-based Pipeline, which together create a foundation for understanding legal and bureaucratic patterns in immigration enforcement. The product supports:

  • Case-level research: The LLM pipeline extracts structured metadata (e.g., outcome, judge, location) from unstructured Federal Court decisions, enabling pattern identification across thousands of rulings.
  • Systemic insights: Curated data stories highlight geopolitical shifts (e.g., Ukraine, Syria), disparities in inadmissibility decisions, and litigation volume trends.
  • Accountability advocacy: The data quality report identifies gaps (e.g., decision office, refusal grounds) that can guide FOI requests or policy dialogue with IRCC.

This layered approach balances narrative clarity with exploratory flexibility, and was validated through continuous partner feedback.

5.3 Key Findings

  • Inadmissibility refusals showed major spikes for Ukraine (2024) and Syria (2022), primarily under s.34(1)(f), with higher rates for permanent residents.
  • Litigation case volumes rose significantly after 2021, with country-specific trends: Nigeria in RAD appeals and Iran in visa refusal challenges.
  • Litigation case types and outcomes differ by country for the four countries tested (chi-square tests, p < 2.2e-16), and A34(1) refusals vary by country in exploratory analysis, though confounding factors cannot be ruled out.
  • Legal text extraction using LLMs achieved ~85% accuracy in pilot tests, enabling scalable, semi-automated analysis.

5.4 Limitations

  • LLM Outputs: Classification and extraction remain probabilistic and unverified without expert legal review.
  • Data Quality: Key fields (e.g., decision authority, application counts, office metadata) were incomplete or inconsistently formatted.
  • Unlinked Datasets: Inability to join litigation and refusal datasets at the case level limited our ability to model decision pathways or systemic bias robustly.

5.5 Recommendations

  1. Human-in-the-loop validation: Future use of the LLM pipeline should incorporate expert oversight to verify extracted features and minimize error propagation.
  2. Targeted data advocacy: The partner can use our findings to advocate for more complete and structured public releases by IRCC, especially for fields like approval rates, decision authority, and refusal reasoning.
  3. Sustainability practices: Maintain reproducible environments and prompt logs to ensure transparency as the project scales or is handed off to other teams.

5.5.1 Final Reflection

This project demonstrates how data science can meaningfully support legal transparency and advocacy but also reveals the fragility of analysis when working with fragmented public data. Our work provides a prototype for responsible, reproducible legal data analysis, and underscores the need for ongoing collaboration between data scientists and legal experts to ensure interpretations remain grounded, ethical, and actionable.